Theory and Practice of Adaptive Filters
November 16, 2022 (GHC 8102)

Abstract: A filter is a succinct data structure that stores an approximate version of a set (a Bloom filter is the best-known example). Classic, widely-used filters achieve an optimal tradeoff between the error rate of the filter and the space it requires.

However, classic filter analysis is somewhat limited: it gives the error performance on a single query. If there is a false positive error on one element, then repeated queries to that element will always result in a false positive. This leads to several issues. First, this means that an adversary can repeat false positive queries, leading to arbitrarily poor filter performance; this is a problem from a security standpoint. Second, the single-query analysis leaves performance on the table due to repeated queries. Many practical datasets have elements that are queried many times; reducing the false positive rate on these repeated queries can lead to improved performance.

In this talk I will discuss several recent data structures that close this gap, fixing false positives so that subsequent queries to these fixed elements are guaranteed to be answered correctly. We will look at upper and lower bounds from a theoretical standpoint, as well as discussing opportunities to make these theoretical ideas implementable in practice.